Geospatial Analysis 2 - Emerging Hot Spot Analysis

Author

Imran Ibrahim

Published

March 17, 2024

Modified

March 27, 2024

Overview

In this page, I will explore the code behind the plots in the Geospatial Analysis module of our Shiny application. Specifically, I will plot the Emerging Hot Spot map.

Emerging Hot Spot Analysis: sfdep methods

Emerging Hot Spot Analysis (EHSA) is a spatio-temporal analysis method for revealing and describing how hot spot and cold spot areas evolve over time.

The analysis consists of four main steps:

  • Building a space-time cube,

  • Calculating the Getis-Ord local Gi* statistic for each bin, using an FDR correction,

  • Evaluating these hot and cold spot trends by using the Mann-Kendall trend test,

  • Categorising each study-area location by referring to the resultant trend z-score and p-value for each location with data, together with the hot spot z-score and p-value for each bin.
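The FDR correction mentioned in step 2 can be illustrated with base R's p.adjust(). This is only a toy sketch with made-up p-values; sfdep applies the correction internally to the Gi* p-values of each bin.

```r
# Toy sketch of the FDR (Benjamini-Hochberg) correction from step 2,
# using base R's p.adjust(). The p-values below are invented for
# illustration only.
p_raw <- c(0.001, 0.012, 0.030, 0.045, 0.200)
p_fdr <- p.adjust(p_raw, method = "fdr")

# Adjusted p-values are never smaller than the raw ones
all(p_fdr >= p_raw)  # TRUE
```

With the correction applied, fewer bins pass a given significance cut-off, which guards against false hot/cold spot discoveries across the many location-time bins.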

Loading R Packages and Data Prep

pacman::p_load(tidyverse, dplyr,
               sf, lubridate, plotly,
               tmap, spdep, sfdep)

Shapefiles for Myanmar admin2 levels

mmr_shp_mimu_2 <-  st_read(dsn = "data/geospatial3",  
                  layer = "mmr_polbnda_adm2_250k_mimu")
Reading layer `mmr_polbnda_adm2_250k_mimu' from data source 
  `C:\teoose\VAA-GroupProject(Netlify)\Prototype\Geospatial Analysis\data\geospatial3' 
  using driver `ESRI Shapefile'
Simple feature collection with 80 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 92.1721 ymin: 9.696844 xmax: 101.17 ymax: 28.54554
Geodetic CRS:  WGS 84

Data Wrangle for quarterly data

As per project requirements, we will sync the time frame for this analysis with our previous LISA analysis. Therefore, we will set up the data set to cover 2021-2023 in quarterly periods.

I won’t repeat the data preparation steps, as these were already covered in a previous prototype page. Instead, I will read in the previously prepared quarterly data for 2021-2023.

Events_2 <- read_csv("data/df1_complete.csv")

Since missing values in this data set have already been filled in using tidyr::complete(), I can proceed with the standard spacetime constructor, i.e. spacetime().
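What tidyr::complete() achieves can be sketched in base R: every location-quarter combination must exist, with missing counts filled as zero. The data below are toy values (the real data set has 80 districts across 12 quarters).

```r
# Base-R sketch of what tidyr::complete() does to the events data:
# build the full location x quarter grid, then fill gaps with 0.
# Toy values for illustration only.
events <- data.frame(
  admin2    = c("Hinthada", "Hinthada", "Labutta"),
  quarter   = c(20211, 20212, 20211),
  Incidents = c(3, 5, 2)
)

full_grid <- expand.grid(
  admin2  = unique(events$admin2),
  quarter = unique(events$quarter),
  stringsAsFactors = FALSE
)

filled <- merge(full_grid, events, all.x = TRUE)
filled$Incidents[is.na(filled$Incidents)] <- 0

nrow(filled)  # 4 rows: 2 districts x 2 quarters
```

Without this step, spacetime() would not yield a valid space-time cube, since the cube requires an observation for every location in every period.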

Creating a Time Series Cube

In the code chunk below, spacetime() of sfdep is used to create a spatio-temporal cube.

First, the location identifier passed to .loc_col needs to have the same column name in both the data and the shapefile.

Events_2 <- Events_2 %>%
        filter(event_type == "Battles") %>%
        rename(DT=admin2) %>%
        select(-event_type, -year, -Fatalities) 
Quarterly_spt <- spacetime(Events_2, mmr_shp_mimu_2,
                      .loc_col = "DT",
                      .time_col = "quarter")
is_spacetime_cube(Quarterly_spt)
[1] TRUE
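The TRUE returned by is_spacetime_cube() reflects the cube's defining property: a complete grid where every location appears in every time period (here, 80 districts x 12 quarters = 960 rows). A quick base-R sanity check of that property on any long-format data frame (toy example):

```r
# A spacetime cube requires a complete grid: every location observed
# in every time period. Toy long-format data frame:
df <- expand.grid(loc = c("A", "B"), t = 1:3)

# Row count must equal (number of locations) x (number of periods)
nrow(df) == length(unique(df$loc)) * length(unique(df$t))  # TRUE
```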

Computing Gi*

Next, we will compute the local Gi* statistics.

Deriving the spatial weights

The code below identifies neighbours and derives inverse distance weights.

Quarterly_nb <- Quarterly_spt %>%
  activate("geometry") %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_inverse_distance(nb, geometry,
                                  scale = 1,
                                  alpha = 1),
         .before = 1) %>%
  set_nbs("nb") %>%
  set_wts("wt")
Note
  • activate() of the sfdep package is used to activate the geometry context.

  • mutate() of the dplyr package is used to create two new columns, nb and wt.

  • Then we activate the data context again and copy the nb and wt columns over to each time-slice using set_nbs() and set_wts().

    • Row order is very important, so do not rearrange the observations after using set_nbs() or set_wts().

Note that the data sets now have neighbors and weights for each time-slice.

head(Quarterly_nb)
# A tibble: 6 × 5
  quarter DT        Incidents nb        wt       
    <dbl> <chr>         <dbl> <list>    <list>   
1   20211 Hinthada          0 <int [7]> <dbl [7]>
2   20211 Labutta           0 <int [3]> <dbl [3]>
3   20211 Maubin            0 <int [7]> <dbl [7]>
4   20211 Myaungmya         0 <int [5]> <dbl [5]>
5   20211 Pathein           0 <int [5]> <dbl [5]>
6   20211 Pyapon            0 <int [5]> <dbl [5]>

Computing Gi*

We can use these new columns to manually calculate the local Gi* for each location, by grouping by quarter and using local_gstar_perm() of the sfdep package. After that, we use tidyr::unnest() to unnest the gi_star column of the newly created gi_stars3 data frame.

#for Quarterly admin 2
gi_stars3 <- Quarterly_nb %>% 
  group_by(quarter) %>% 
  mutate(gi_star = local_gstar_perm(
    Incidents, nb, wt)) %>% 
  tidyr::unnest(gi_star)
gi_stars3
# A tibble: 960 × 15
# Groups:   quarter [12]
   quarter DT      Incidents nb    wt    gi_star cluster    e_gi  var_gi std_dev
     <dbl> <chr>       <dbl> <lis> <lis>   <dbl> <fct>     <dbl>   <dbl>   <dbl>
 1   20211 Hintha…         0 <int> <dbl>  -0.938 Low     0.00935 1.49e-4 -0.767 
 2   20211 Labutta         0 <int> <dbl>  -0.645 Low     0.00634 1.96e-4 -0.453 
 3   20211 Maubin          0 <int> <dbl>  -0.938 Low     0.00960 1.54e-4 -0.772 
 4   20211 Myaung…         0 <int> <dbl>  -0.801 Low     0.00886 1.82e-4 -0.657 
 5   20211 Pathein         0 <int> <dbl>  -0.801 Low     0.00819 1.69e-4 -0.630 
 6   20211 Pyapon          0 <int> <dbl>  -0.801 Low     0.00887 1.71e-4 -0.679 
 7   20211 Bago            1 <int> <dbl>   0.321 Low     0.0105  1.37e-4  0.532 
 8   20211 Taungoo         8 <int> <dbl>   0.432 High    0.0206  1.02e-4 -0.263 
 9   20211 Pyay            0 <int> <dbl>  -0.337 Low     0.00932 1.65e-4 -0.128 
10   20211 Thayar…         0 <int> <dbl>  -0.270 Low     0.00912 1.52e-4 -0.0395
# ℹ 950 more rows
# ℹ 5 more variables: p_value <dbl>, p_sim <dbl>, p_folded_sim <dbl>,
#   skewness <dbl>, kurtosis <dbl>

Mann-Kendall Test

With these Gi* measures we can then evaluate each location for a trend using the Mann-Kendall test.
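The idea behind the test can be shown with base R alone: Kendall's tau compares every pair of time points and measures how consistently the series moves in one direction. The toy series below is invented for illustration (the page itself uses Kendall::MannKendall).

```r
# Toy illustration of the Mann-Kendall idea using base R only.
# A strictly increasing Gi* series is a perfect upward trend,
# so Kendall's tau against the time index equals 1.
gi_series <- c(-1.2, -0.5, 0.1, 0.8, 1.4, 2.0)
time_idx  <- seq_along(gi_series)

res <- cor.test(time_idx, gi_series, method = "kendall")
res$estimate  # tau = 1 for a monotone increase
res$p.value   # small p-value -> significant trend
```

A tau near +1 marks an intensifying (emerging hot) trend, near -1 a declining trend, and near 0 no monotonic trend.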

The code chunk below uses Hinthada district as an example.

cbg3 <- gi_stars3 %>% 
  ungroup() %>% 
  filter(DT == "Hinthada") %>% 
  select(DT, quarter, gi_star)

Next, we plot the result using ggplotly() of the plotly package.

Hinthada district quarterly

p3 <- ggplot(data = cbg3, 
       aes(x = quarter, 
           y = gi_star)) +
  geom_line() +
  theme_light()

ggplotly(p3)

Mann Kendall test for Hinthada district-quarterly

cbg3 %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)
# A tibble: 1 × 5
     tau    sl     S     D  varS
   <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.242 0.304   -16  66.0  213.

Values returned by the Mann-Kendall test:

  • tau: Kendall’s tau statistic

  • sl: two-sided p-value

  • S: Kendall score

  • D: denominator, tau = S/D

  • varS: variance of S
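As a quick arithmetic check of the tau = S/D relationship, the Hinthada values reported above are consistent:

```r
# Sanity check: tau = S / D, using the Hinthada values
# reported in the table above.
S   <- -16
D   <- 66.0
tau <- S / D
round(tau, 3)  # -0.242, matching the reported tau
```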

We can replicate this for each location by using group_by() of the dplyr package.

Admin 2 districts-quarterly

ehsa3 <- gi_stars3 %>%
  group_by(DT) %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>%
  tidyr::unnest_wider(mk)
ehsa3
# A tibble: 80 × 6
   DT                              tau     sl     S     D  varS
   <chr>                         <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Bago                        -0.0606 0.837     -4  66.0  213.
 2 Bawlake                     -0.333  0.150    -22  66.0  213.
 3 Bhamo                       -0.303  0.193    -20  66.0  213.
 4 Danu Self-Administered Zone -0.394  0.0865   -26  66.0  213.
 5 Dawei                        0.515  0.0236    34  66.0  213.
 6 Det Khi Na                  -0.212  0.373    -14  66.0  213.
 7 Falam                       -0.333  0.150    -22  66.0  213.
 8 Gangaw                       0.121  0.631      8  66.0  213.
 9 Hakha                       -0.0303 0.945     -2  66.0  213.
10 Hinthada                    -0.242  0.304    -16  66.0  213.
# ℹ 70 more rows

Arrange to show significant emerging hot/cold spots

Admin 2 districts-quarterly

emerging3 <- ehsa3 %>% 
  arrange(sl, abs(tau)) %>% 
  slice(1:5)

emerging3
# A tibble: 5 × 6
  DT               tau      sl     S     D  varS
  <chr>          <dbl>   <dbl> <dbl> <dbl> <dbl>
1 Mandalay       0.667 0.00319    44  66.0  213.
2 Maubin        -0.667 0.00319   -44  66.0  213.
3 Pyapon        -0.667 0.00319   -44  66.0  213.
4 Pyinoolwin     0.636 0.00493    42  66.0  213.
5 Yangon (West) -0.636 0.00493   -42  66.0  213.

Performing Emerging Hotspot Analysis

Lastly, we will perform EHSA by using emerging_hotspot_analysis() of the sfdep package. It takes a spacetime object x (i.e. Quarterly_spt) and the quoted name of the variable of interest (i.e. Incidents) for the .var argument.

The k argument specifies the number of time lags, which is set to 1 by default.

Lastly, nsim specifies the number of simulations to be performed.

ehsa3 <- emerging_hotspot_analysis(
  x = Quarterly_spt, 
  .var = "Incidents", 
  k = 1, 
  nsim = 99
)

Visualising the distribution of EHSA classes

In the code chunk below, ggplot2 functions are used to reveal the distribution of EHSA classes as a bar chart.

Admin2 districts - quarterly

#| fig-width: 12
#| fig-height: 7
#| column: body-outset-right

ggplot(data = ehsa3,
       aes(x = classification)) +
  geom_bar()

Visualising EHSA

In this section, we will visualise the geographic distribution of EHSA classes. Before we can do so, however, we need to join mmr_shp_mimu_2 and ehsa3 together by using the code chunk below.

mmr3_ehsa <- mmr_shp_mimu_2 %>%
  left_join(ehsa3,
            by = join_by(DT == location))

Next, tmap functions will be used to plot a categorical choropleth map by using the code chunk below.

#| fig-width: 10
#| fig-height: 7
#| column: body-outset-right

ehsa_sig3 <- mmr3_ehsa  %>%
  filter(p_value < 0.05)

tmap_mode("plot")

tm_shape(mmr3_ehsa) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(ehsa_sig3) +
  tm_fill("classification") + 
  tm_borders(alpha = 0.4)

References

Main reference: Kam, T.S. (2024). Emerging Hot Spot Analysis: sfdep methods
